Comment: Monitoring Networked Applications With Incremental Quantile Estimation
Our comments are in two parts. First, we make some observations regarding the
methodology in Chambers et al. [arXiv:0708.0302]. Second, we briefly describe
another interesting network monitoring problem that arises in the context of
assessing quality of service, such as loss rates and delay distributions, in
packet-switched networks.
Comment: Published at http://dx.doi.org/10.1214/088342306000000600 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Some Forensic Aspects of Ballistic Imaging
Analysis of ballistics evidence (spent cartridge casings and bullets) has been a staple of forensic criminal investigation for almost a century. Computer-assisted databases of images of ballistics evidence have been used since the mid-1980s to help search for potential matches between pieces of evidence. In this article, we draw on the 2008 National Research Council report Ballistic Imaging to assess the state of ballistic imaging technology. In particular, we discuss the feasibility of creating a national reference ballistic imaging database (RBID) from test-fires of all newly manufactured or imported firearms. A national RBID might aid in using crime scene ballistic evidence to generate investigative leads to a crime gun's point of sale. We conclude that a national RBID is not feasible at this time, primarily because existing imaging methodologies have insufficient discriminatory power. We also examine the emerging technology of microstamping for forensic identification purposes: etching a known identifier on firearm or ammunition parts so that it can be directly read and recovered from crime scene evidence. Microstamping could provide a stronger basis for identification based on ballistic evidence than the status quo, but substantial further research is needed to thoroughly assess its practical viability.
Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models
Low-order functional ANOVA (fANOVA) models have been rediscovered in the
machine learning (ML) community under the guise of inherently interpretable
machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and
GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting
functional main effects and second-order interactions. We propose a new
algorithm, called GAMI-Tree, that is similar to EBM, but has a number of
features that lead to better performance. It uses model-based trees as base
learners and incorporates a new interaction filtering method that is better at
capturing the underlying interactions. In addition, our iterative training
method converges to a model with better predictive performance, and the
embedded purification ensures that interactions are hierarchically orthogonal
to main effects. The algorithm does not need extensive tuning, and our
implementation is fast and efficient. We use simulated and real datasets to
compare the performance and interpretability of GAMI-Tree with EBM and
GAMI-Net.
Comment: 25 pages plus appendix
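The abstract names purification but does not describe it. One standard way to make a pairwise interaction hierarchically orthogonal to its main effects, assuming the interaction has been binned onto a grid and the bin weights factor into a product of per-feature weights, is weighted double centering: subtract the interaction's weighted marginal means and fold them into the main effects and the intercept. The sketch below works only in that assumed setting; `purify_interaction` and its arguments are hypothetical names, and this is not presented as GAMI-Tree's own implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def purify_interaction(
    f_jk: Matrix, w_j: List[float], w_k: List[float]
) -> Tuple[Matrix, List[float], List[float], float]:
    """Split a gridded interaction f_jk[a][b] into
    intercept + d_j[a] + d_k[b] + pure[a][b], where `pure` has zero weighted
    mean over each feature. Assumes product weights w_j[a] * w_k[b]."""
    n_a, n_b = len(f_jk), len(f_jk[0])
    sw_j, sw_k = sum(w_j), sum(w_k)
    # Weighted mean over x_k for each level a of x_j (row means).
    row = [sum(w_k[b] * f_jk[a][b] for b in range(n_b)) / sw_k for a in range(n_a)]
    # Weighted mean over x_j for each level b of x_k (column means).
    col = [sum(w_j[a] * f_jk[a][b] for a in range(n_a)) / sw_j for b in range(n_b)]
    # Grand weighted mean, which moves into the intercept.
    grand = sum(w_j[a] * row[a] for a in range(n_a)) / sw_j
    # Double centering leaves an interaction with zero weighted row and column means.
    pure = [[f_jk[a][b] - row[a] - col[b] + grand for b in range(n_b)] for a in range(n_a)]
    # Centered amounts to add to the main effects of x_j and x_k.
    d_j = [row[a] - grand for a in range(n_a)]
    d_k = [col[b] - grand for b in range(n_b)]
    return pure, d_j, d_k, grand


pure, d_j, d_k, c = purify_interaction([[1.0, 2.0], [3.0, 8.0]], [1.0, 1.0], [1.0, 1.0])
print(pure)  # [[1.0, -1.0], [-1.0, 1.0]]
```

With product weights a single pass is enough; if the joint weights do not factor (for example, correlated features), the row and column centering would have to be repeated until the marginal means stop changing.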
Monotone Tree-Based GAMI Models by Adapting XGBoost
Recent papers have used machine learning architectures to fit low-order
functional ANOVA models with main effects and second-order interactions. These
GAMI (GAM + Interaction) models are directly interpretable because the functional
main effects and interactions can be easily plotted and visualized.
Unfortunately, it is not easy to incorporate the monotonicity requirement into
the existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013)
and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form
$f(x) = \sum_j f_j(x_j) + \sum_{j<k} f_{jk}(x_j, x_k)$ and develops monotone tree-based GAMI
models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is
straightforward to fit a monotone model to $f(x)$ using the options in XGBoost.
However, the fitted model is still a black box. We take a different approach:
i) use a filtering technique to determine the important interactions, ii) fit a
monotone XGBoost algorithm with the selected interactions, and finally iii)
parse and purify the results to get a monotone GAMI model. Simulated datasets
are used to demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which
use piecewise constant fits. Note that the monotonicity requirement applies to the
full model. In certain situations, the main effects will also be monotone.
However, as the examples show, the interactions need not be monotone.
Comment: 12 pages
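Why monotonicity of the full model does not carry over to the interactions can be seen with a toy example (hypothetical, not one of the paper's simulated datasets, and smooth rather than piecewise constant). Take $f(x_1, x_2) = 2x_1 + x_1(x_2 - 0.5)$ on $[0,1]^2$: its slope in $x_1$ is $1.5 + x_2 > 0$, so the full model is increasing in $x_1$, yet the interaction term alone decreases in $x_1$ whenever $x_2 < 0.5$. The check below only inspects a finite grid, which is enough to exhibit the counterexample.

```python
from typing import Callable, List


def f_main(x1: float) -> float:
    # Hypothetical main effect of x1.
    return 2.0 * x1


def f_inter(x1: float, x2: float) -> float:
    # Hypothetical x1-by-x2 interaction; its slope in x1 is (x2 - 0.5).
    return x1 * (x2 - 0.5)


def full_model(x1: float, x2: float) -> float:
    return f_main(x1) + f_inter(x1, x2)


def is_nondecreasing_in_x1(g: Callable[[float, float], float], grid: List[float]) -> bool:
    """Check, at grid points only, that g(x1, x2) never decreases as x1 increases."""
    return all(
        g(a, x2) <= g(b, x2)
        for x2 in grid
        for a, b in zip(grid, grid[1:])
    )


grid = [i / 10 for i in range(11)]  # x1 and x2 on [0, 1]
print(is_nondecreasing_in_x1(full_model, grid))  # True
print(is_nondecreasing_in_x1(f_inter, grid))     # False
```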
On the Number of Crossings of Empirical Distribution Functions
Let $F$ and $G$ be two continuous distribution functions that cross at a finite number of points $-\infty \le t_1 < \cdots < t_k \le \infty$. We study the limiting behavior of the number of times the empirical distribution function $G_n$ crosses $F$ and the number of times $G_n$ crosses $F_n$. It is shown that, as $n \to \infty$, these variables can be represented as the sum of $k$ independent geometric random variables whose distributions depend on $F$ and $G$ only through $F'(t_i)/G'(t_i)$, $i = 1, \ldots, k$. The technique involves approximating $F_n(t)$ and $G_n(t)$ locally by Poisson processes and using renewal-theoretic arguments. The implications of the results for an algorithm for determining stochastic dominance in finance are discussed.
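Crossing counts of this kind are easy to compute in simulation. Below is a minimal sketch for the two-sample case ($G_n$ against $F_n$) only, under the assumed convention that a crossing is a strict sign change of $G_n - F_n$, with points where the two functions coincide skipped; the paper's precise definition may differ, and `count_crossings` is a hypothetical helper. Both empirical distribution functions are right-continuous step functions that change only at sample points, so evaluating their difference at the pooled sample points captures its whole path.

```python
import random
from bisect import bisect_right
from typing import List, Sequence


def count_crossings(x: Sequence[float], y: Sequence[float]) -> int:
    """Count sign changes of G_n - F_n, where F_n is the empirical
    distribution function of x and G_n that of y. Zeros of the difference
    are skipped, so touching without changing sign is not counted."""
    xs, ys = sorted(x), sorted(y)
    signs: List[int] = []
    for t in sorted(set(xs) | set(ys)):
        # Sign of G_n(t) - F_n(t) = cy/ny - cx/nx, compared with integers to avoid rounding.
        d = bisect_right(ys, t) * len(xs) - bisect_right(xs, t) * len(ys)
        if d != 0:
            signs.append(1 if d > 0 else -1)
    return sum(1 for s, u in zip(signs, signs[1:]) if s != u)


# Illustrative run: N(0, 1) and N(0, 2^2) distribution functions cross at t = 0.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [random.gauss(0, 2) for _ in range(1000)]
print(count_crossings(x, y))
```

Repeating the run over many seeds and sample sizes gives an empirical look at the distribution of the crossing count that the asymptotic result describes; the choice of normal distributions here is arbitrary.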